Research during the past several years has led to the development of methods for the preparation and validation of criterion-referenced tests (Hambleton & Eignor, 1979; Millman, 1974; Popham, 1978). On the other hand, very little attention has been paid to the reporting and interpreting of the scores from these tests. For example, in two recent reviews of the criterion-referenced testing field (Hambleton, Swaminathan, Algina, & Coulson, 1978; Popham, 1978a) only a few sentences were devoted to these topics. The likely explanation is that measurement specialists have spent their time researching topics that logically precede the reporting and interpreting of test scores (for example, sorting out definitional problems, preparing methods for assessing content validity, assessing test score reliability, and determining test lengths).

It is unfortunate, however, that reporting and interpreting test scores have not received more attention. The purpose of a testing program is, after all, to provide usable information in a convenient format. Test score information that is inappropriate, confusing, or in any other way unsuited to the needs of potential test score users will be of limited value.



Laboratory of Psychometric and Evaluative Research Report No. 100. Amherst, MA: School of Education, University of Massachusetts, 1979.

A paper presented at the annual meeting of the Northeastern Educational Research Association, Ellenville, NY, 1979.



The quality and appropriateness of criterion-referenced test score reporting directly affect the extent to which test score information is used. At present, millions of students at all levels of education take criterion-referenced tests. The decisions made from the results of these tests range from diagnosing learning deficiencies and monitoring student progress in objectives-based programs to program evaluation and funding decisions. Many of these decisions have potential long-term implications for examinees.

It is imperative, therefore, that the information provided to decision-makers be the appropriate type of information and that it be in a format which facilitates effective decision-making.

One might be tempted to suggest that reporting forms developed over the years for norm-referenced test scores could suffice with minor revisions. However, there are two reasons why norm-referenced test score reporting practices are inappropriate here. First, as will be discussed later, there are a number of problems with current methods for reporting norm-referenced test score information. For example, small differences between scores are often over-emphasized by test users even when confidence bands of performance are reported. Second, the nature of the statements to be made about examinees is fundamentally different with criterion-referenced tests. Norm-referenced tests are constructed, principally, to facilitate comparisons among individuals (or groups) in relation to the performance of a norm group. Criterion-referenced tests, on the other hand, are developed to facilitate the interpretation of individual (or group) test performance in relation to objectives or competencies (Hambleton & Eignor, 1979). Since the primary purpose of criterion-referenced tests differs from that of norm-referenced tests, it is hardly surprising that approaches to reporting and using their scores will differ considerably.

The areas of criterion-referenced test score reporting and utilization require study since (1) little direct research has been carried out, (2) norm-referenced test score reporting technology is of limited value, and (3) the use of criterion-referenced tests has reached major proportions. The purposes of this study, therefore, are to (1) review the literature related to reporting, interpreting, and utilizing test results in the areas of criterion-referenced and norm-referenced testing, (2) determine the qualities of a good report, and (3) provide guidelines for use in the development and evaluation of reports. The focus of this paper will be on two types of reporting systems. We are interested in the systems that accompany:

• criterion-referenced tests, and

• combinations of criterion-referenced and norm-referenced tests.

We will not concern ourselves in this paper with statewide testing programs or programs which are solely concerned with reporting group information (for example, the National Assessment of Educational Progress).

The remainder of this paper is divided into four sections.
Necessary antecedents to the preparation of high quality reports are 
considered in one section. The other three sections correspond to the 
three purposes of the study. 
Review of Literature

Since proper reporting and interpretation of scores is so important, test developers and practitioners need to have at their disposal standards by which they can develop and evaluate reports of test scores. Although general guidelines exist for interpreting and reporting norm-referenced test scores, a great deal of dissatisfaction can be found with the manner in which test scores are used and interpreted. Page (1977) has said, "On one general problem of testing we can be in fair agreement: A great gap exists between the expertise of test development and the amateurish use of test scores" (pp. 8-9). Fisher (1978) notes that there is no shortage of trained personnel to handle technical problems in the areas of test development, administration, and scoring. But, according to Fisher, "The problems begin when these results are communicated to various audiences, and the problems get serious when someone attempts to assign meaning to the data" (p. 35). Lewis (1977) reports that the "United Parents Organization recommended that Boards of Education set policies requiring principals to present and interpret test results in a comprehensible way to parents" (pp. 17-18). Popham (1978b), while praising some aspects of the reporting system of the California Assessment Program, notes "a certain half-heartedness in the explanatory documents that accompany CAP results" (p. 10). Hagen (1977) has also expressed concern:

Those of us who have devoted our professional
lives to testing and evaluation need to pay much
more attention to translating test scores into
constructive actions. Many of us have been too
concerned with the predictive validity of tests
and have been too little concerned with what test
scores mean in terms of behavior and constructive
actions to be taken to facilitate the development
of the individual. We need to work more closely
with teachers and educators to determine what
information they need in order to make education
more effective and help them get this information
in a form which is useful to them. (p. 167)

Clearly, problems exist in the translation of test scores into 
useful decisions. Both norm-referenced and criterion-referenced tests 
are criticized with respect to adequacy of the reporting systems. Six 
areas of concern with respect to providing test score information to 
interested parties are described in the educational literature. The
areas are: 

1. Uses of test scores. 

2. Manner of reporting scores.

3. Limited testing knowledge among teachers, parents, and students. 

4. Presentation of results to parents and students. 

5. Test score interpretation difficulties. 

6. Use of computer technology to report test scores. 

In the remainder of this review of literature each of the six areas will be briefly considered.

Uses of Test Scores

Stetz (1978) lists five major uses of test results: prediction, diagnosis, research, program evaluation, and assessment of achievement. The stated purposes of a testing program will determine to a large extent the use of the scores. However, the audiences for whom the scores are intended are also an important consideration. Individuals requiring information are students and their families, teachers, and administrators. Although some information will be desired by all groups, each group has unique information needs.

Students and their families need test information for predictive and diagnostic applications as well as for assessment of achievement (Stetz, 1978). The predictive application of test results may be used to choose appropriate curricula and to make educational and vocational choices (Goslin, 1967; Kirby, Culp, & Kirby, 1973). The diagnostic use of test results to identify an individual's strengths and weaknesses and to develop strategies for improvement or remediation is often cited (Bradley, 1978; Goslin, 1967; Hagen, 1977). When test results are used for determining achievement, the focus may be on achievement during a school year (Goslin, 1967), relative achievement in several different subjects or areas, or performance in one subject or area over time (Gardner, 1977).

Teachers may use test results as aids in making decisions about students and in evaluating their instruction. Examples of decisions about students are decisions about grouping and placement (Wahlstrom, Danley, & Raphael, 1977) and group and individual diagnoses (Rost, 1973). Evaluation of instruction includes teacher self-assessment (Wahlstrom, Regan, & Jones, 1978), curricular reform (Rost, 1973; Wahlstrom, Danley, & Raphael, 1977), and the appropriateness of the difficulty of course objectives (Wahlstrom, Regan, & Jones, 1978).



At the local level a variety of administrative uses of test results are possible. Test results can be used to compare the performance of a school or system to national norms (Gardner, 1977; Wahlstrom, Danley, & Raphael, 1977). Comparison of schools within a system may help identify patterns of achievement over time (Gardner, 1977) or identify problem schools which may need additional resource personnel (Wahlstrom, Danley, & Raphael, 1977). Other administrative uses of test results include program evaluation and curriculum development (Goslin, 1967; Lawson & Ward, 1976; Stetz, 1978; Wahlstrom, Danley, & Raphael, 1977) and the evaluation of teacher effectiveness (Goslin, 1967).

Manner of Reporting Scores

The type of statement made from test results depends upon the strategy used for measuring achievement. Ahmann (1978) and Millman (1978) list three strategies: item-centered, objective-centered, and subtest-centered. Test score reports can therefore be centered around items, objectives, or subtests.

In an item-centered approach, information is presented about performance on each item in a test. Such a strategy can be employed to provide information about group performance on specific skills (reflected by single test items). It would not usually be advisable, however, to make statements about an individual examinee's performance on an objective on the basis of performance on a single test item, because the information provided by a single test item is unreliable.
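A group-level item-centered report can be sketched as the proportion of examinees answering each item correctly. This is a minimal Python illustration, not a procedure taken from the cited sources; it assumes responses are already scored 0/1, and the function name `item_p_values` and the data are hypothetical.

```python
from typing import List


def item_p_values(responses: List[List[int]]) -> List[float]:
    """Return the proportion of examinees answering each item correctly.

    responses[e][i] is 1 if examinee e answered item i correctly, else 0.
    """
    n_examinees = len(responses)
    n_items = len(responses[0])
    return [
        sum(row[i] for row in responses) / n_examinees
        for i in range(n_items)
    ]


# Hypothetical group of four examinees answering three items.
scored = [
    [1, 0, 1],
    [1, 1, 0],
    [1, 0, 1],
    [0, 0, 1],
]
for item, p in enumerate(item_p_values(scored), start=1):
    print(f"Item {item}: {p:.2f} of the group answered correctly")
```

Because each value rests on a single item, it describes the group; consistent with the caution above, it should not be read as an individual's mastery of an objective.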
Objective-centered reporting involves making a statement about an individual's or group's performance on an objective on the basis of several items which measure that objective. A test may include many objectives and have many items per objective. Multiple test items per objective increase the reliability of reported examinee objective scores, but decrease the breadth of coverage of a test unless testing time is increased.
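Objective-centered scoring can be sketched as summing an examinee's item scores within each objective and comparing the total with a cut-off score. This is a hypothetical illustration: it assumes each item is keyed to exactly one objective, and the objective labels, item key, and cut-off of 2 of 3 items are invented for the example (the paper does not say how cut-offs are set).

```python
from collections import defaultdict
from typing import Dict, List


def objective_scores(item_scores: List[int], item_objective: List[str]) -> Dict[str, int]:
    """Sum an examinee's 0/1 item scores within each objective."""
    totals: Dict[str, int] = defaultdict(int)
    for score, objective in zip(item_scores, item_objective):
        totals[objective] += score
    return dict(totals)


def mastery(totals: Dict[str, int], cutoffs: Dict[str, int]) -> Dict[str, bool]:
    """Classify as a master (True) when the objective score reaches its cut-off."""
    return {obj: totals.get(obj, 0) >= cut for obj, cut in cutoffs.items()}


# Hypothetical test: six items, three per objective, hypothetical cut-off of 2.
key = ["M1", "M1", "M1", "M2", "M2", "M2"]
student = [1, 1, 0, 0, 1, 0]
totals = objective_scores(student, key)
print(totals)                                # {'M1': 2, 'M2': 1}
print(mastery(totals, {"M1": 2, "M2": 2}))   # {'M1': True, 'M2': False}
```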

Subtest-centered reporting usually involves a small number of skills, but many items measuring each skill. The skills are typically not extensively defined. The scores derived on the various components of a typical norm-referenced test (mathematics concepts, problem-solving, and computations; vocabulary, reading comprehension) are examples of subtest-centered reports.

A fourth strategy also exists. Millman (1970) suggests that two lists of objectives may be useful. One list would be for teachers. This list would include specific objectives to be measured. The second list, which would be for parents, would include broader objectives. Millman is suggesting that teachers receive information about performance on each objective whereas parents could receive information about clusters of related objectives. For example, Millman suggests that teachers may need information on objectives such as "identification of coins and converting coins to equivalent amounts of other coin values." Parents, on the other hand, might be confused by data on a large number of small objectives. It would be better to provide them with information such as "understands the dollar value of money" (p. 227).
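Millman's two-list idea amounts to rolling objective-level results up into broader clusters for parents. The sketch below is a hypothetical illustration assuming mastery has already been decided for each specific objective; splitting Millman's coin objective into two sub-objectives and the function name `cluster_report` are assumptions made for the example.

```python
from typing import Dict, List


def cluster_report(objective_mastery: Dict[str, bool],
                   clusters: Dict[str, List[str]]) -> Dict[str, str]:
    """Summarize mastery of specific objectives as one line per broad cluster."""
    report = {}
    for cluster, objectives in clusters.items():
        # True counts as 1, so this counts the mastered objectives in the cluster.
        mastered = sum(objective_mastery.get(o, False) for o in objectives)
        report[cluster] = f"{mastered} of {len(objectives)} objectives mastered"
    return report


# Teacher-level objectives (hypothetical split) grouped under a parent-level cluster.
clusters = {"Understands the dollar value of money": ["identify coins", "convert coins"]}
teacher_view = {"identify coins": True, "convert coins": False}
print(cluster_report(teacher_view, clusters))
```

Teachers would still see the objective-level dictionary; parents would see only the cluster line.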

Limited Testing Knowledge

One reason that teachers often misuse or misinterpret test scores is that they are unfamiliar with the field of tests and measurements. Goslin (1967) found that less than 40 percent of all teachers have had more than one course in test and measurement techniques. Many teachers have had no exposure to test and measurement techniques in either formal classes or in-service training. Over 50 percent of all elementary and private secondary school teachers (in the 1960s, at least) had no formal test and measurement training. It is not surprising, therefore, to find that many teachers do not understand or properly use test results. The call for in-service and pre-service training to upgrade teacher competencies in interpreting and utilizing test results is widespread (for example, Dunn, 1969; Fleming, 1971; Lewis, 1977; Rost, 1973).

In most cases, parents also lack competence in interpreting and understanding test results. Wahlstrom, Danley, and Raphael (1977) found most educators reluctant to present raw results to parents due to the perception that parents are unable to properly interpret the results. It was felt that parents might place too much emphasis and meaning on the scores. Others who have expressed concern about overemphasis of test scores include Anastasi (1971), Backman (1976), and Brown (1976).
Presentation of Results to Parents and Students

One topic that has received much attention is the presentation of results to students. Since test results contain information of great potential benefit, it is important that students receive the results in the best way possible. The results affect not only the student's intellectual response, but also his or her emotional response.

There is general agreement that reports of test results should be presented in face-to-face interpretive interview sessions (Backman, 1976; Bradley, 1978; Kirby, Culp, & Kirby, 1973; Miller, 1977; Thorndike & Hagen, 1977). In some situations, it would seem that group interpretive sessions are as effective as individual sessions (Folds & Gazda, 1966; Lallas, 1956; Rubenstein, 1978; Wright, 1963). Walker (1965) found individual sessions and group sessions equally effective when students were asked to recall scores, but found that individual sessions led to more acceptance of scores by examinees. While group sessions may be useful for explaining the concept of error in scores or for other general explanatory purposes, the potential effects of the scores on the student and his/her parents require individual interpretive sessions.

Anastasi (1971) indicates a preference for the use of broad levels of performance and qualitative descriptions over numerical scores. Further, scores "should be accompanied by interpretative explanations by a professionally trained person" (p. 56). Backman (1976) also recommends reporting bands instead of numerical scores to reduce the chance of overemphasizing small differences in scores.
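One common way to build such bands, assumed here rather than taken from Backman (1976), is the standard error of measurement (SEM) from classical test theory, SEM = SD × sqrt(1 − reliability), reporting the observed score plus or minus one SEM. The standard deviation, reliability, and scores below are hypothetical.

```python
import math
from typing import Tuple


def score_band(observed: float, sd: float, reliability: float,
               width_in_sem: float = 1.0) -> Tuple[float, float]:
    """Return (low, high) = observed -/+ width_in_sem * SEM,
    where SEM = sd * sqrt(1 - reliability) (classical test theory)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    sem = sd * math.sqrt(1.0 - reliability)
    return observed - width_in_sem * sem, observed + width_in_sem * sem


# Hypothetical values: SD = 10 and reliability = 0.91 give SEM = 3.
low, high = score_band(52, sd=10, reliability=0.91)
print(f"Score band: {low:.0f} to {high:.0f}")  # Score band: 49 to 55

# Two scores 2 points apart: overlapping bands warn against over-reading the gap.
a = score_band(52, 10, 0.91)
b = score_band(54, 10, 0.91)
print("Bands overlap:", a[1] >= b[0])  # Bands overlap: True
```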

Test results should be interpreted in light of other information available about the examinee. Thorndike and Hagen (1977) make the following statement about test interpretation:

It should be set in the frame of reference of 
the particular student. Test scores should be 
interpreted in terms of what is known about 
the student's aptitude and about his educational 
or vocational goals. It should be directed toward 
positive and constructive action. It should 
emphasize the assets in a test profile or it 
should be oriented toward remedial action when 
achievement falls below what aptitude would lead 
one to expect. (p. 578)

The incorporation of background information is considered to be important in reporting student grades as well as in the interpretation of the results of standardized tests (Performance printouts for parents, 1974). For example, a computer reporting system in Memphis allows teachers to include anecdotal information about the students in their reports to parents. Teachers select, from a list of statements stored on a computer, those which apply to each child. The printouts of the statements are sent home after the report cards to provide parents with descriptive information about their child's performance and work habits. The quotation above suggests that a similar system might be desirable when test results are reported as well.

Bradley (1978), Miller (1977), and Kirby, Culp, and Kirby (1973) suggest that test results be discussed with the examinee in light of his/her feelings on the day of the test and with reference to personal characteristics. Bradley and Kirby et al. suggest that the quality of a test score interpretation is enhanced when the examinee may see some or all of the test items. In many cases, however, it is not desirable to release actual test items. Popham (1978a, 1978b) recommends that the detailed statements of objectives measured by a test (called "domain specifications") be available upon request, but that they not be widely distributed. The length and detail of domain specifications render them too complex to be of use to most individuals. Popham recommends the use of "descriptive abstracts" which draw on those aspects of the test specifications which are directly relevant to instruction.

Test Score Interpretation Difficulties 

Five major uses of tests and the corresponding types of tests which are most useful are listed in Table 1. It is clear from the table that many testing programs will require both norm-referenced and criterion-referenced interpretations. If this is the case, several options are available. It is possible to use a norm-referenced test with both norm-referenced and criterion-referenced interpretations. Most norm-referenced tests, however, do not have objectives or domains stated with sufficient specificity to allow for "strong" criterion-referenced interpretations (i.e., inferences cannot be made safely from examinee performance on a set of test items measuring an objective to a large class of behaviors defined by the objective). Also, the manner in which items are selected for inclusion in a norm-referenced test (deletion of items which are too easy or too difficult for an examinee group) does not facilitate criterion-referenced interpretations. A second option is to use a criterion-referenced test with both norm-referenced and criterion-referenced interpretations. The problem here is that norms for criterion-referenced tests tend to be somewhat unstable for individual interpretations. Another alternative is to administer both a criterion-referenced test and a norm-referenced test. This approach may take more time and money than the others. On the other hand, the combined quality of the norm-referenced and criterion-referenced interpretations is apt to be better. A fourth possibility is the use of a single test battery that has a norm-referenced component and a criterion-referenced component. Such a battery allows for the best in criterion-referenced test development to be used in one section and the best in norm-referenced development to be used in the other. Users need not make psychometric sacrifices in order to obtain both criterion-referenced and norm-referenced interpretations. The primary advantages of the fourth approach over the third are the consistency of format and approach in the two tests and the availability of data (usually) on the interrelationship of scores from the two tests.

Table 1

The Major Uses of Tests, Their Purposes, and the Appropriate
Type of Test Needed to Accomplish Each Purpose

Use of Test     Purpose                                            Type of Test Needed

Prediction      Differentiate among individuals on the basis of    Norm-referenced
                an ability or a trait.

Diagnosis       Determine what a particular individual can and     Criterion-referenced
                cannot do.

Research        Determine the relationship among variables.        Norm-referenced

                Compare performance in experimental and            Criterion-referenced
                non-experimental groups on well-defined tasks.

Evaluation      Determine the extent to which instruction has      Criterion-referenced
                been effective in reaching program goals.

                Determine achievement relative to that of other
                programs:
                  a. on program objectives                         Criterion-referenced
                  b. on global measures                            Norm-referenced

Assessment of   Determine competence of students after             Criterion-referenced
Achievement     instruction.

                Determine relative achievement of individuals      Norm-referenced
                after instruction.

Use of Computer Technology

The growth of computer technology has simplified the task of scoring tests and preparing reports. Baker (1971) points out that the speed and accuracy of the computer allow the use of various scoring keys with a single test to provide analyses beyond those usually provided. He also advocates the use of detailed verbal descriptions of student performance. Lewis (1977) has said that computer scoring allows publishers to provide much more information than is currently provided and to provide it in such a way as to help teachers diagnose learning problems.



Nichols and Knopf (1977) have listed several advantages of computerized score interpretations beyond those mentioned above. Such systems are faster and less expensive than individual interpretation systems. They can be used by persons not trained in test interpretation, and they are less subject to misinterpretation when read by different people than are raw or standard scores.

Most available computer scoring systems do not, however, take full advantage of the available options. Furlong and Miller (1978) have pointed out that many scoring programs only provide students with reports of items missed and identification of the correct response for those items. Such reports do not provide information about an individual's performance relative to course objectives. Furlong and Miller (1978) describe a computer scoring program which provides individual reports of (1) an individual's performance relative to other students taking the test, (2) incorrectly answered items and correct responses to those items, (3) the objectives to which incorrectly answered items refer, and (4) if the instructor desires, alternative material which may be used for further study. The program allows instructors to receive summaries of performance by item and by objective. The report also summarizes performance by taxonomic level of objectives.
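Parts (2) and (3) of such an individual report can be sketched briefly. This is a hypothetical illustration of the kind of output described, not Furlong and Miller's program; the answer key, item-to-objective map, objective names, and the function name `individual_report` are all assumed for the example.

```python
from typing import Dict, List


def individual_report(answers: List[str], key: List[str],
                      item_objective: List[str]) -> Dict[str, object]:
    """Report missed items, their correct responses, and the objectives involved."""
    # Items are numbered from 1, as on a printed answer sheet.
    missed = [i for i, (a, k) in enumerate(zip(answers, key), start=1) if a != k]
    return {
        "missed_items": missed,
        "correct_responses": {i: key[i - 1] for i in missed},
        "objectives_to_review": sorted({item_objective[i - 1] for i in missed}),
    }


key = ["B", "D", "A", "C"]
objectives = ["Fractions", "Fractions", "Decimals", "Decimals"]
report = individual_report(["B", "A", "A", "B"], key, objectives)
print(report)
```

Linking each missed item to its objective is what moves the report from "items missed" toward the objective-referenced information the text calls for.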

It is clear that computer-generated reports and interpretations offer a promising, but as yet unfulfilled, alternative to traditional reports. They allow for a variety of scoring schemes, matching of report formats to audience needs, and reduction of score misinterpretation.
Prerequisites for Appropriate Criterion-Referenced
Test Score Reporting Systems

A test score report which provides needed information to several groups of interested parties is the product of much work. Several activities must occur before high quality reports can be prepared. The activities are listed below:

- Specification of Information Needs,

- Building a Testing Program Consistent with Needs,

- Identification of Audiences and Levels of Sophistication,

- Proper Test Selection, and

- Proper Test Construction.

The purpose of this section is to briefly discuss these necessary antecedents to the preparation of appropriate reports.

Specification of Information Needs

A school system should clearly specify the groups who are to be served by a testing program and what (specifically) their information needs are. From there, it is possible for a school district to formally state its purposes. Along the way, the school subjects, course objectives, and grade levels which will be involved in testing should be specified.

Building a Testing Program Consistent with Needs

Two points are of concern. The first is a test characteristic mentioned by Popham (1978a): an adequate number of items per measured behavior. Although the number of items desired is not usually specified, a general idea of the relative emphasis to be placed on each domain of content should be available. The other consideration is the scope of the program. A good testing program should provide data in relation to (at least) the most important information needs. It is important, however, that each testing situation include enough items per objective to yield reliable measurement of the objectives of interest at that time without requiring excessive amounts of testing time. It is necessary, therefore, to design a testing program which provides reliable and valid data on objectives and which presents the data in a manner that will not confuse the audiences or overload them with information which is either too specific or too general for their purposes.
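The trade-off between items per objective and testing time can be made concrete with the Spearman-Brown prophecy formula, a standard classical-test-theory result that the paper itself does not cite. The sketch assumes any added items are parallel to the existing ones; the starting reliability of 0.50 for four items and the target of 0.80 are hypothetical.

```python
import math


def spearman_brown(reliability: float, k: float) -> float:
    """Projected reliability when the number of items is multiplied by k."""
    return k * reliability / (1 + (k - 1) * reliability)


def items_needed(current_items: int, reliability: float, target: float) -> int:
    """Smallest whole number of parallel items expected to reach the target."""
    k = target * (1 - reliability) / (reliability * (1 - target))
    # Round first so floating-point noise (e.g. 16.000000000000004) does not
    # push math.ceil up to the next whole item.
    return math.ceil(round(current_items * k, 9))


# Hypothetical objective measured by 4 items with reliability 0.50.
print(spearman_brown(0.50, 2))       # doubling to 8 items: about 0.67
print(items_needed(4, 0.50, 0.80))   # 16
```

Quadrupling the items for one objective to reach the target shows why, within fixed testing time, greater reliability per objective usually means fewer objectives per test.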

Identification of Audiences and Levels of Sophistication 

Decisions must be made about the types of information needed and 
the people who will receive the information. The lack of sophistication
of teachers and parents in the field of tests and measurements has been 
discussed previously. This information should be considered when a 
school system determines the manner in which information will be pre- 
sented to the various audiences. Specific statements of reporting 
goals at this stage can ease the burden of test selection and dissemi- 
nation of results. 

Proper Test Selection

If the purposes of the program have been adequately clarified as outlined above, test selection is considerably easier. The task is to identify the available tests which come closest to matching:

- the curricular emphases,

- the scope and focus, and

- the informational requirements of the program.

The task is not to identify the one test that exactly matches the program specifications. Such a search would in all probability be fruitless. The task is to identify a number of tests which come close to meeting the exact requirements. When the process of test selection is undertaken, one of the available guidelines for selection will be beneficial (Hambleton & Eignor, 1978; APA, 1974).

Proper Test Construction

One essential consideration is that of test development. A system which is considering purchasing a commercial test will find the previously mentioned selection guidelines helpful in assessing a test (Hambleton & Eignor, 1978). Some systems will want to develop their own criterion-referenced tests. Such a situation requires staff with both test development training and the time to do the job (Hambleton & Eignor, 1979).
Summary

The purpose of this section has been to emphasize the importance of several prerequisites of appropriate reporting systems. Although fulfilling these prerequisites will not guarantee the preparation of a high quality report, failing to meet them will almost certainly produce low quality reports due to inappropriate or inaccurate data. The prerequisites are therefore necessary but not sufficient to ensure the preparation of high quality reports of test results. The characteristics of reporting systems which do meet the needs of the several audiences interested in receiving test results and interpretations are considered in the next section.

Characteristics of Appropriate Test Score
Reporting Systems

In this section, elements and options which should be available in reports of criterion-referenced test scores are discussed. A logical analysis of the potential of criterion-referenced tests, current uses of the tests, and information needs of various audiences was used to generate the recommendations reported in this section. Four audiences are addressed:

- Teachers,

- Parents and Students,

- Building Administrators, and

- District Administrators.

Table 2 provides a listing of the four audiences, the information to be reported, and the rationale for providing the information. Several of the information needs require explanation beyond that provided in the table. These needs are noted with a numerical superscript. Explanations are found in Appendix A. In a final section, four important characteristics which apply to all reports will be discussed.
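Several of the teacher-level needs in Table 2 (masters and non-masters by objective, summary class data for each objective, and objectives on which class performance was low) can be derived from a single mastery table. The following is a hypothetical sketch: it assumes mastery has already been decided for each student and objective, treats a missing entry as non-mastery, and uses an invented threshold of 0.5 for "low" performance; student names and objective labels are also invented.

```python
from typing import Dict, List, Tuple

Mastery = Dict[str, Dict[str, bool]]  # student -> objective -> mastered?


def class_summary(mastery: Mastery, low_threshold: float = 0.5) -> Tuple[dict, List[str]]:
    """Return per-objective masters/non-masters and the low-performance objectives."""
    objectives = sorted({obj for record in mastery.values() for obj in record})
    summary = {}
    for obj in objectives:
        # A student with no entry for an objective is counted as a non-master.
        masters = sorted(s for s, rec in mastery.items() if rec.get(obj, False))
        non_masters = sorted(s for s, rec in mastery.items() if not rec.get(obj, False))
        summary[obj] = {
            "masters": masters,
            "non_masters": non_masters,
            "proportion_mastering": len(masters) / len(mastery),
        }
    low = [obj for obj in objectives if summary[obj]["proportion_mastering"] < low_threshold]
    return summary, low


students = {
    "Ana": {"M1": True, "M2": False},
    "Ben": {"M1": True, "M2": False},
    "Cy": {"M1": False, "M2": True},
}
summary, low = class_summary(students)
for obj, row in summary.items():
    print(obj, row)
print("Objectives with low class performance:", low)
```

The non-master lists are the kind of output a teacher could use for grouping, while the low-performance list points to objectives that may need group remediation.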




Table 2

Audiences Desiring Test Results, Their Information Needs and
Examples of Uses of the Information Provided

Audience: Teachers

Master list of objectives tested.
    Use: Provide general comparison of test and curricular match.

Information keying items to objectives and objectives to clusters.
    Use: Provide specific comparisons of class activities and tests.

Individual student data by objective, including raw scores and cut-off scores.
    Use: Identification of specific individual deficiencies and the degree of remediation necessary.

Individual student data by objective cluster.
    Use: Identification of general areas of individual deficiencies.

Diagnostic statements of errors of non-masters.
    Use: Aid the design of instructional activities to upgrade performance.

Performance of individuals on previous tests of the same or related objectives.
    Use: Identify trends in individual strengths and weaknesses.

Identification by objective of all students who were classified as masters and those classified as non-masters.
    Use: Devise grouping patterns for new instruction and/or remediation.

Summary class data for each objective.
    Use: Identification of specific instructional and/or curricular deficiencies.

Identification of objectives on which performance of the class was low.
    Use: Self-evaluation of instruction and determination of needs for group remediation.

Summary class data for each cluster of objectives.
    Use: Identification of general areas of instructional and/or curricular deficiencies.

Previous performance of the class on the same or related objectives.
    Use: Identify trends in class performance.

Previous performance of students in classes taught by this teacher on the same or related objectives.
    Use: Identify trends in effectiveness of curriculum and/or instruction.

Performance of other classes at the same instructional level in the system.
    Use: Determine performance of the class relative to performance in the system.

Performance of other classes at the same instructional level in the state or nation (optional).
    Use: Determine performance of the class relative to state or national performance.

Audience: Parents and Students

Performance on clusters of objectives.
    Use: Provide general overview of performance.

Identification of specific objectives on which performance is low.
    Use: Determine specific deficiencies.

Inclusion of sample items from non-mastered objectives.
    Use: Clarification of skills to be mastered.

Identification of trends of performance across tests or subtests.
    Use: Identification of strengths and weaknesses in broad areas of performance.

Performance from previous tests on the same or related objectives.
    Use: Identification of trends of improvement or decline.

Performance relative to other students in the same class.
    Use: Determine relative standing in the class.

Performance relative to other students at the same instructional level in the system.
    Use: Determine relative standing in the system.

Performance relative to other students at the same instructional level in the state or nation (optional).
    Use: Determine relative performance as compared to a national sample.

Performance in relation to aptitudes.
    Use: Determine if student is performing to his/her potential.

Narrative diagnostic and interpretive reports to supplement numerical summaries.

Statement from an official of the school system (if desired by system).

Comments from student's teacher.



BuildlnR Admin ts- ^j^Summaries of subtest performance for 
trators each classroom 

Summaries of subject performance for 
each classroom. 



Summaries of subtest performance by 
grade for the school. 

Identification, for each subtest, of 
clusters of objectives on whicli per- 
formance was low. 

Summaries of subject purf orman<*i» by 
jjrade for the school. 

Summari(*s of subject performance by 
grade on previous tests of the same 
object ives. 

Sut.'marli'S of stiulent performatu<» on 
key 9b j e/ti^l ves . '* 

Sumfuari«»s of studeni performance^ hv 
Kra<le for other <iistrlrt schools. 




Use of the Information 



Reduce misunderstanding of scores, 
provide alternative views of the data, 
identify areas needing attention. 

Explain some aspect of the testing 
program • 

Provide background information to 
enhance Interpretation of scores. 

Identification of classes which may 
need specific remediation. 

Identification of classes which may 
need general remediation. Determine 
the need for added personnel or In- 
service in a subject. 

Identify trends of performance. 

Indicate the need for currlcular and/or 
instructional revision. 

Identify subjects in m rd of currioular 
and/or Instructional revision or 
Increased resources. 

Identify areas of improvement of decline. 



Monitor progress on school or di.strivt 
priorities. 

Comparison an<i Identification of specific 
strengths and weaknesses. Identification 
of trends in the district. ^ 



Audience 



Information Needs 



Use of the Information 



District 

Administrators 



Summaries of subject perfonnance by grade 
for other district schools. 

Master list of objectives and percentage 
of students classified as masters in 
each school and the district* 

Individual permanent record labels. 

Performance by grade relative to state 
or national norms for each subtest 
(optional). 

Summaries of subtest performance in 
each grade by school. 

Summaries of subject perfonnance In 
each grade by school . 



Summaries of subtest performance in 
each grade for the district. * 

Summaries of subject performance In 
each grade for the district* 

Identification, for subtest » those 

schools in whiclr perf onnance was low. 

Stimmaries of subject performance by 
grade on previoiis tests of the same 
tib)e< t fvos. 

Master list of objectives, number of 
it**ms pt*r objective, cut-off scores, 
niul porit»ntagi» of students in tho 
(i ist r j et eX( i»edi 11^; t he ( ut-of f siori*. 

S\inimar if s of stuili-iit pi»r f t>rm«'mi'e t>u 
kt»v i>h jfct i ves « 



Com(t>arison with other schools on a 
general basis. Identification of 
general performance trends. 

Reference and comparison. 



Student files. 

Determine relative standing of classes 
of students. 



Determine achievement levels. 

Determine schools in need of additional 
resources (financial or special 
personnel) . 

Public release. 
Public release* 
I)eti*rmlne In-servicc needs. 



Identify trends of Impr .>ven4ent or 
decline In the district. 



Ref erence. 



I 
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MiMiitor progress on school or district 
priorit les. 
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Audience 



Information Needs 



Use ot the Information 



••Split" summaries of subject performance 
by designated subgroups (race, sex» etc.)* 

Normative data of subtest performance 
relative to the state or nation. 

Computer tapes containing •'raw'* data 
of student performance* 



Public release » reports to government 
officials. 

Comparison of achievement with other 
districts. 

Research studies within the system* 




Important Characteristics for All Reports 

There are a number of characteristics of report forms which are 
important, regardless of audience. They are: 

1. Physical considerations 

2. Reporting normative information 

3. Flexibility of cut-off scores 

4. Generalizability of the test scores.

Each of the characteristics will be considered next. 

Physical Considerations. The size of the report can be a problem.
Small reports are easy to lose or damage. Large reports are cumbersome
and hard to store in standard folders or notebooks. Therefore, reports
should be printed on standard 8½" x 11" paper. Each page should list
the audience to receive the report, a date, the information included on
that page, and the examinee or group of examinees covered by the
information. Reports should arrive in the format in which they will be
distributed. That is, school personnel should not be required to fold,
cut, or paste reports for the different audiences.

Whenever possible, all information pertaining to one test or
subtest should be included on a single page. This eliminates the
need for referring back and forth between pages to make comparisons.
Attempts to provide all data on one page should not, however, forsake
legibility; sufficient space should be allowed between columns and
rows of scores to allow easy reading. Reports which have alternating
rows or columns of shaded and nonshaded background facilitate
legibility. Narrative passages should be within a page of the tables to
which they refer if they cannot be included on the same page. Not only
does this keep related information together, but it also separates
tables of numbers from one another, which makes the report easier to
read.

Reporting Normative Information. If the tests include a
norm-referenced component, the norm-referenced information for a test
should be included with the criterion-referenced information for the
same test. Norm-referenced information for all tests or subtests should
not be grouped together on a separate sheet. To do so invites confusion
and misinterpretation. Norm-referenced interpretations should always be
reported as bands of numbers or as numbers including error terms
(for example, eightieth to ninetieth percentile, or eighty-fifth
percentile ± five percentiles).
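A band such as the one in the example above can be produced by looking up the percentile rank, within the norm group, of the observed score minus and plus its error term. The sketch below assumes a list of raw scores from a norm group and a known standard error of measurement (SEM) for the test; the function names and the convention of counting tied scores as half below are illustrative choices, not a procedure prescribed in this paper.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(norm_scores, x):
    """Percent of the norm group scoring below x, counting ties at x as half."""
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, x)          # scores strictly less than x
    ties = bisect_right(ordered, x) - below  # scores exactly equal to x
    return 100.0 * (below + 0.5 * ties) / len(ordered)


def percentile_band(norm_scores, raw_score, sem):
    """Percentile ranks at raw_score - sem and raw_score + sem (a +/- 1 SEM band).

    The SEM is assumed to come from the test's technical documentation.
    """
    return (percentile_rank(norm_scores, raw_score - sem),
            percentile_rank(norm_scores, raw_score + sem))
```

With a norm group whose scores run evenly from 0 to 99, a raw score of 50 with an SEM of 5 yields the band (45.5, 55.5), which a report would print as a range rather than as the single rank 50.5.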

Flexibility of Cut-off Scores. One quality that greatly enhances
the value of the reports is allowing school systems to choose the
cut-off score for each objective. Since schools place different
importance on different objectives, it is reasonable to expect students
to perform better on some objectives than on others. School systems
could receive instructions on procedures for choosing an appropriate
cut-off score for each objective. Alternately, it is possible to provide
school systems with a list of objectives and three possible cut-off
scores for each objective, which could be chosen to reflect the level of
importance placed on the domain at a certain grade level in the system.
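The three-option approach described above might look like the following minimal sketch. The objective labels, item counts, and cut-off values are hypothetical; in practice they would come from the test publisher's list of objectives.

```python
# Hypothetical menu of three candidate cut-off scores (number of items
# correct) per objective, one for each level of local emphasis.
CUTOFF_MENU = {
    "OBJ-01": {"low": 3, "medium": 4, "high": 5},  # objective with 6 items
    "OBJ-02": {"low": 5, "medium": 6, "high": 8},  # objective with 10 items
}


def classify(raw_scores, emphasis):
    """Classify an examinee on each objective using the locally chosen cut-off.

    raw_scores maps objective -> items correct; emphasis maps objective ->
    "low", "medium", or "high", as selected by the school system.
    """
    result = {}
    for objective, score in raw_scores.items():
        cutoff = CUTOFF_MENU[objective][emphasis[objective]]
        result[objective] = "master" if score >= cutoff else "non-master"
    return result
```

A system that places heavy emphasis on OBJ-01 would choose "high", so a student with 4 of 6 items correct would be classified a non-master there, while the same score would count as mastery under "medium".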

Generalizability of the Test Scores. Many of the tests which are
currently called criterion-referenced tests are more accurately described
by the term objective-referenced tests. The difference is an important
one. An objective-referenced test is one in which items are keyed to
behavioral objectives. The scores on such a test reflect an examinee's
ability on those items which make up the test. A criterion-referenced
test, on the other hand, is composed of items which represent a sample
from a well-defined content or behavior domain. Such a test allows
an examinee's score to be interpreted not only in relation to the items
on the test, but also in relation to the entire domain of behavior
sampled by the test. It is the latter interpretation that is most
often desired (so much so that such interpretations are often
made even when the domain of behavior has not been specified). A test
score report should include a section describing the generalizability
of the test scores. Failure to provide such information invites over-
or under-interpretation of the scores.
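One common way to make the domain-referenced interpretation concrete, assuming the items for an objective are a random sample from its domain, is to treat the proportion of items answered correctly as an estimate of the examinee's domain score and to attach a simple binomial standard error to it. This is a textbook approximation offered for illustration, not a method proposed in this paper.

```python
import math


def domain_score_estimate(items_correct, items_administered):
    """Estimate an examinee's domain score from one objective's items.

    Assumes the items are a random sample from a well-defined domain, so the
    proportion correct estimates the proportion of the whole domain the
    examinee could answer. The standard error is the simple binomial
    approximation sqrt(p * (1 - p) / n).
    """
    if items_administered <= 0 or not 0 <= items_correct <= items_administered:
        raise ValueError("need n > 0 and 0 <= items_correct <= n")
    p = items_correct / items_administered
    se = math.sqrt(p * (1 - p) / items_administered)
    return p, se
```

For example, 8 correct out of 10 items gives an estimate of 0.80 with a standard error of about 0.13, a reminder that a short test supports only a fairly wide statement about the domain.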

Summary. The elements which should be found in reports of test
scores have been briefly considered. Four different audiences were
considered: teachers, parents and students, building administrators,
and higher level administrators. Each audience has different needs
and should receive reports which address those needs. Several
characteristics were discussed which apply to all reports. These include
physical considerations, placement of norm-referenced information,
flexibility of cut-off scores, and generalizability of the test scores.
Guidelines for Evaluating Score Reporting Systems

This section of the paper provides questions which can be used
to evaluate or guide the development of reports of criterion-referenced
test results. The questions are broken into six sections reflecting:

1. Audiences to whom reports should be provided.

2. Components of teacher's reports.

3. Components of reports received by parents.

4. Components of reports received by building administrators.

5. Components of reports received by higher level administrators.

6. General considerations for all reports.

All questions are worded positively; that is, if the report is in
line with recommendations of the previous section, the answer to the
question would be yes. It is suggested, however, that yes-no responses
not be used. Instead, answers should be "S", "E", or "N". "S"
would indicate that the information is provided as part of the standard
reporting package of the test. "E" indicates that the information can be
provided, but that an extra charge is involved. "N" is used to indicate
that the information or service is not available.
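Responses coded this way are easy to tally by section when several reporting packages are being compared. A minimal sketch, assuming each question is identified by its number (for example, "2.7") and rated with one of the three letters; the function name and input layout are illustrative.

```python
from collections import Counter

VALID_CODES = {"S", "E", "N"}  # standard, extra charge, not available


def tally(ratings):
    """Count S/E/N ratings per guideline section.

    ratings maps a question number such as "2.7" to "S", "E", or "N";
    the section is the part of the number before the first period.
    """
    by_section = {}
    for question, code in ratings.items():
        if code not in VALID_CODES:
            raise ValueError(f"{question}: expected S, E, or N, got {code!r}")
        section = question.split(".")[0]
        by_section.setdefault(section, Counter())[code] += 1
    return by_section
```

A package rated "S" on most teacher-report questions but "E" on most administrator questions would show that pattern directly in the per-section counts.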

1. Audiences

1.1 Are reports available for classroom teachers?

1.2 Are individual student reports available
for students and their parents?

1.3 Are reports available for building
administrators?

1.4 Are reports available for higher level
administrators such as superintendents
and their assistants?
2. Teacher Reports

2.1 Are all objectives (or domains) measured by
the test listed?

2.2 Are the items which represent each domain
identified?

2.3 Is the total number of items measuring each
objective clearly defined?

2.4 Is the cut-off score which was used to assign
examinees to mastery states on each objective
identified?

2.5 Is the raw score (or percent score) of each
child on each domain printed?

2.6 Are students who have been classified as
masters identified for each objective?

2.7 Is summary data on class performance available
for each objective (average percent scores)?

2.8 Are clusters of related objectives identified?

2.9 Is performance of each student on each of the
clusters provided?

2.10 Is summary data of class performance on each
of the clusters provided?

2.11 Are individuals whose performance is
substandard listed for each objective?

2.12 Are diagnostic statements available about
the errors of each examinee?

2.13 Are objectives identified on which total class
performance was relatively low?

2.14 Is information pertaining to individuals'
previously identified strengths and weaknesses
provided (after the first year)?

2.15 Is information pertaining to strengths and
weaknesses identified in previous classes
taught by the teacher provided (after the
first year)?

2.16 Are summaries of performance of other classes
at the same instructional level in the system
available for each cluster of objectives?

2.17 Are summaries of performance of other classes
at the same instructional level in the state
or nation available (optional)?
3. Parent and Student Reports

3.1 Is performance reported for each cluster of
objectives?

3.2 Within each cluster, are the objectives on which
performance was substandard identified?

3.3 Are example items included in the identification of
objectives in which performance was substandard?

3.4 Are common sources of errors which occur across
tests or subtests identified?

3.5 Are improvements or declines in performance from
previous test administrations noted (after the
first year)?

3.6 Is performance reported in relation to aptitudes?

3.7 Is the typical performance of other students
in the same class identified for each cluster
of objectives?

3.8 Is the typical performance of other students
at the same instructional level in the system
identified for each cluster of objectives?

3.9 If norms are reported, are they reported as
bands rather than specific percentile ranks?

3.10 Are diagnostic statements included which refer
to objectives in which performance was low?

3.11 Is it possible for a standard statement from
the superintendent (or another official) to
be included in the report?

3.12 Is there a section of the report which includes
comments about each child which teachers have
chosen from a list of standard statements?



4. Building Administrator Reports

4.1 Are summaries of performance on each subtest
available for each classroom?

4.2 Are summaries of performance on each subtest
available by grade?

4.3 Are summaries of performance on each subtest
available for other schools in the district?

4.4 Are summaries of subject performance
available for each classroom?

4.5 Are summaries of subject performance available
by grade?

4.6 Are summaries of subject performance available
for other schools in the district?

4.7 Are summaries of past performance of each school
provided for each subtest (after the first year)?

4.8 For each subtest, are clusters of objectives
identified on which performance in the
system was low?

4.9 Is a master table which identifies school
performance on all objectives provided?

4.10 Are individual scores provided in a manner
which facilitates placing them in permanent
student record files?

4.11 Are summaries of student performance on
key objectives available?

4.12 Are norms reported for use in judging the
school against others in the state or nation?



5. Higher Level Administrator Reports

5.1 Are summaries of subtest performance
available by grade for each school?

5.2 Are summaries of subject performance
available by grade for each school?

5.3 Are district summaries of subtest
performance available?

5.4 Are district summaries of subject performance
available?

5.5 Are schools which perform poorly identified
for each subtest?

5.6 Are results of previous tests of the same
objectives available by grade for each subject?

5.7 Is a master list of objectives provided?

5.8 Is the number of items for each objective
listed?

5.9 Are cut-off scores included?

5.10 Are summaries of percent masters in the
district provided for each objective?

5.11 Is information provided which relates to
student performance on designated key
objectives?

5.12 Are "split" summaries of performance of
designated subgroups available (by race,
sex, etc.)?

5.13 Is normative data provided?

5.14 Is a computer tape of "raw" student data
available?



6. General Considerations

6.1 Are all reports on 8½" x 11" paper?

6.2 Does each page of the report identify the
audience to receive the report?

6.3 Does each page of the report identify the
information on that page?

6.4 Does each page of the report identify the
examinee or group of examinees for which
information is provided on that page?

6.5 Is the test data clearly identified on each
page of the report?

6.6 Is all information about one test or subtest
included on the same page whenever possible?

6.7 Are rows and columns of numbers well spaced
or placed on backgrounds of different shades
to facilitate legibility?

6.8 Are narrative passages within one page of the
numerical information to which they refer?

6.9 If norm-referenced information is reported, is
the information included with relevant
criterion-referenced information?

6.10 Is norm-referenced information always reported
as a band or as a number with an error term
provided?

6.11 Are systems able to choose a cut-off score for
each objective in order to allow local curricular
emphasis to influence mastery decisions?

6.12 Are reports provided in a form which does not
require system personnel to further prepare
the reports before dissemination (cutting,
folding, pasting)?

6.13 Is a section of the report devoted to a
discussion of the generalizability of the
test scores?



Summary

It is clear that current reporting systems for use with
criterion-referenced tests are not satisfactory. In this paper we have
discussed the relevant literature and the qualities necessary in a
high quality report. Also, we have provided a set of guidelines for
reporting systems. At least two tasks lie ahead. First, using the
guidelines presented here, examples of high quality reporting systems
should be prepared. These reports would serve as references for others
as they develop reporting systems to accompany criterion-referenced
testing programs. Second, the guidelines presented in this paper should
be used to evaluate many of the reporting systems which accompany
currently available criterion-referenced tests. Such evaluations would
be helpful for school systems as they consider the selection of a
testing system to provide necessary information for effective decision
making.
Appendix A

Notes to Accompany Table 2

1. It should be noted that the data provided to parents is less
specific than the data provided to teachers. While teachers
need specific information on every objective in order to devise
instructional prescriptions, parents and students may only
require information about performance on clusters of related
objectives. A report of an arithmetic computation test might
include information about the student's overall performance
in addition, multiplication, subtraction, and division. Further
subdivisions such as subtraction of whole numbers, subtraction of
decimals, subtraction of fractions, etc., might provide too many
sets of data for the parents. It would be better to provide an
overall performance appraisal for subtraction and then identify
areas which need further work.

For example, a child might answer 28 out of 35 subtraction problems
correctly, but only 3 out of 7 questions dealing with the
subtraction of fractions. The report to the parent would say that
the student had answered 80% of the items related to subtraction
correctly, but that subtraction of fractions was an area where
performance was low. Statements of objectives and example items for
those areas which show less than adequate achievement should be
included.

2. Communication between schools and parents is often neglected. When
reports of test results reach the parents they are often unaware of 
the purpose or scope of the testing program. A short statement 
from the superintendent or some other official would enhance the 
acceptance and understanding of the program and the scores reported. 

3. Teachers are in possession of a wealth of information which could
enhance student and parent understanding of test score reports.
Teachers could receive a coded list of statements concerning
classroom activities. Teachers could select, from the list, those
statements which apply to each student. The codes could be recorded
on the student's answer sheet. A computer program could then
include the statements in the individual reports. Statements could
range from identification of objectives which have not yet been
taught to statements pertaining to an individual's interest in a
given subject area.

4. Often a school or school district will choose a small number of key
objectives on which to concentrate in a given year. The option
should exist for a number of objectives (2-3 per subject area) to
be classified as key objectives. Data on these key objectives
should be presented to building administrators and to system
administrators.
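The parent report described in notes 1 and 3 can be sketched as a small routine that summarizes a cluster, flags objectives with low performance, and appends the teacher's coded comments. The 28-of-35 subtraction result and the 3-of-7 fractions result come from note 1; the split of the remaining 28 items between whole numbers and decimals, the 60% flagging threshold, the two-digit comment codes, and the statement wording are all hypothetical.

```python
# Hypothetical coded list of statements distributed to teachers (note 3).
STATEMENTS = {
    "01": "This objective has not yet been taught in class.",
    "02": "Shows a strong interest in this subject area.",
    "03": "Would benefit from additional practice with this skill.",
}


def decode_comments(field):
    """Turn two-digit codes recorded on an answer sheet (e.g. "0103") into statements."""
    if len(field) % 2:
        raise ValueError("comment field must contain whole two-digit codes")
    codes = [field[i:i + 2] for i in range(0, len(field), 2)]
    unknown = [code for code in codes if code not in STATEMENTS]
    if unknown:
        raise ValueError(f"unknown statement codes: {unknown}")
    return [STATEMENTS[code] for code in codes]


def parent_report(cluster, results, comment_field="", low_threshold=0.6):
    """Build report lines for one cluster of objectives (note 1).

    results maps objective -> (items correct, items administered). The
    overall percent correct is reported for the cluster, and any objective
    whose proportion correct falls below low_threshold is flagged.
    """
    right = sum(correct for correct, _ in results.values())
    total = sum(n for _, n in results.values())
    lines = [f"{cluster}: {100 * right / total:.0f}% of items answered correctly."]
    low = [obj for obj, (correct, n) in results.items() if correct / n < low_threshold]
    if low:
        lines.append("Areas needing further work: " + ", ".join(low) + ".")
    lines.extend(decode_comments(comment_field))
    return lines


report = parent_report(
    "Subtraction",
    {
        "subtraction of whole numbers": (13, 14),
        "subtraction of decimals": (12, 14),
        "subtraction of fractions": (3, 7),
    },
    comment_field="03",
)
for line in report:
    print(line)
# Subtraction: 80% of items answered correctly.
# Areas needing further work: subtraction of fractions.
# Would benefit from additional practice with this skill.
```

Reporting only the cluster percentage plus the flagged objectives keeps the parent's view to a few lines, while the teacher's report would still list every objective separately.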